Maximum Negentropy Beamforming

Authors

  • Kenichi Kumatani
  • John McDonough
  • Dietrich Klakow
  • Philip N. Garner
  • Weifeng Li
Abstract

In this paper, we address an adaptive beamforming application based on the capture of far-field speech data from a single speaker in a real meeting room. After the position of the speaker is estimated by a speaker tracking system, we construct a subband-domain beamformer in the generalized sidelobe canceller (GSC) configuration. In contrast to conventional practice, we then optimize the active weight vectors of the GSC so as to obtain an output signal with maximum negentropy (MN). This implies that the beamformer output should be as non-Gaussian as possible. To calculate negentropy, we consider the Γ and the generalized Gaussian (GG) pdfs. After MN beamforming, Zelinski post-filtering is performed to further enhance the speech by removing residual noise. Our beamforming algorithm can suppress noise and reverberation without the signal cancellation problems encountered in conventional adaptive beamforming algorithms. We demonstrate this through experiments on acoustic simulations. Moreover, we demonstrate the effectiveness of the proposed technique through a series of far-field automatic speech recognition experiments on the Multi-Channel Wall Street Journal Audio Visual Corpus (MC-WSJ-AV), a corpus of data captured with real far-field sensors in a realistic acoustic environment and spoken by real speakers. On the MC-WSJ-AV evaluation data, the simple delay-and-sum beamformer provided a word error rate (WER) of 17.8%, and the delay-and-sum beamformer with post-filtering achieved 16.5%. MN beamforming with the Γ pdf achieved a WER of 15.8%, which was further reduced to 13.2% with the GG pdf.
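The GSC configuration described in the abstract can be sketched for a single subband as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a far-field steering vector v with unit-modulus entries, a delay-and-sum quiescent vector w_q = v/N, and a Griffiths-Jim style difference blocking matrix B (the paper's own choice of B is not given here). The helper names (`steering_vector`, `blocking_matrix`, `gsc_weights`, `beamform`) and the microphone delays are hypothetical. What it shows is that the total weight vector w = w_q - B w_a passes the look direction undistorted for any active weight vector w_a, which is what leaves w_a free to be tuned by a criterion such as maximum negentropy.

```python
import cmath
import math
import random


def steering_vector(delays_s, freq_hz):
    """Far-field steering vector for one subband: v[n] = exp(-j*2*pi*f*tau_n)."""
    return [cmath.exp(-2j * math.pi * freq_hz * tau) for tau in delays_s]


def blocking_matrix(v):
    """Griffiths-Jim style blocking matrix, returned as a list of N-1 columns.

    Column k holds v[k] at row k and -v[k+1] at row k+1, so that
    b_k^H v = |v[k]|^2 - |v[k+1]|^2 = 0 for a unit-modulus steering vector.
    """
    n = len(v)
    columns = []
    for k in range(n - 1):
        col = [0j] * n
        col[k] = v[k]
        col[k + 1] = -v[k + 1]
        columns.append(col)
    return columns


def gsc_weights(v, w_a):
    """Total GSC weights w = w_q - B w_a, with quiescent weights w_q = v / N."""
    n = len(v)
    w = [vi / n for vi in v]  # delay-and-sum quiescent vector, so w_q^H v = 1
    for col, a in zip(blocking_matrix(v), w_a):
        for i in range(n):
            w[i] -= col[i] * a
    return w


def beamform(w, x):
    """Subband beamformer output Y = w^H X for one snapshot X."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


# Hypothetical 4-microphone array and 1 kHz subband; the delays are made up.
delays = [0.0, 1.2e-4, 2.4e-4, 3.6e-4]
v = steering_vector(delays, 1000.0)

rng = random.Random(1)
w_a = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(len(v) - 1)]
w = gsc_weights(v, w_a)

# Look-direction response: ~1.0 for any choice of active weights w_a.
print(abs(beamform(w, v)))
```

Because every column of B is orthogonal to v, the active weights can only change how interference and reverberation leak through the sidelobes, never the response toward the tracked speaker position.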

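The maximum-negentropy criterion can be illustrated with the GG pdf. Negentropy is J(Y) = H(Y_gauss) - H(Y), where Y_gauss is a Gaussian variable with the same variance as Y; it is zero for Gaussian Y and positive otherwise, so maximizing it pushes the output away from Gaussianity. The sketch below is an assumption-laden illustration, not the paper's estimator: it assumes a zero-mean, circular complex GG density with shape parameter p (p = 2 recovers the complex Gaussian), approximates H(Y) by the sample average of -log p(Y), and uses a hypothetical shape value p = 1 and synthetic test signals. The Γ-pdf variant is not shown.

```python
import math
import random


def gg_log_pdf(y, sigma2, p):
    """Log-density of a zero-mean circular complex generalized Gaussian.

    p(y) = p / (2*pi*beta^2*Gamma(2/p)) * exp(-(|y|/beta)**p), where
    beta^2 = sigma2 * Gamma(2/p) / Gamma(4/p) makes E|y|^2 = sigma2.
    p = 2 gives the complex Gaussian; p < 2 gives heavier (super-Gaussian) tails.
    """
    beta2 = sigma2 * math.exp(math.lgamma(2.0 / p) - math.lgamma(4.0 / p))
    log_norm = math.log(p) - math.log(2.0 * math.pi * beta2) - math.lgamma(2.0 / p)
    return log_norm - (abs(y) ** 2 / beta2) ** (p / 2.0)


def negentropy(samples, p):
    """Sample estimate of J(Y) = H(Y_gauss) - H(Y).

    H(Y_gauss) = log(pi * e * sigma2) for a complex Gaussian with the same
    variance; H(Y) is approximated by the average of -log p_GG(y).
    """
    n = len(samples)
    sigma2 = sum(abs(y) ** 2 for y in samples) / n
    h_gauss = math.log(math.pi * math.e * sigma2)
    h_model = -sum(gg_log_pdf(y, sigma2, p) for y in samples) / n
    return h_gauss - h_model


def complex_gauss(rng):
    """Circular complex Gaussian sample with E|g|^2 = 1."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


rng = random.Random(0)
n = 20000
gaussian = [complex_gauss(rng) for _ in range(n)]
# Gaussian scale mixture: same unit variance, heavier tails (a crude stand-in for speech).
heavy_tailed = [math.sqrt(rng.expovariate(1.0)) * complex_gauss(rng) for _ in range(n)]

j_gauss = negentropy(gaussian, p=1.0)
j_heavy = negentropy(heavy_tailed, p=1.0)
print(j_gauss, j_heavy)
```

Both synthetic signals have unit variance, yet the heavy-tailed one should receive the larger score; in the same spirit, an MN beamformer would adjust w_a to move the subband output toward that heavier-tailed regime. Because H(Y) is approximated with a fixed model density, the Gaussian samples can score slightly below zero, and with p = 2 the estimate is identically zero.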
Similar articles

Towards Online Maximum Kurtosis Beamforming

In prior work, the current authors investigated the use of optimization criteria for beamforming that exploit the non-Gaussianity of human speech. In particular, we examined beamforming algorithms designed to maximize the kurtosis or negentropy of the subband output of a generalized sidelobe canceller. These techniques, while effective, require making multiple passes through the data, and hence...

On Hidden Markov Model Maximum Negentropy Beamforming

In prior work, we developed a beamforming algorithm intended for automatic recognition of speech data captured with an array of distant microphones. In addition to enforcing a distortionless constraint in a desired direction, we adjusted the sensor weights so as to maximize a negentropy criterion. Negentropy is a measure of how non-Gaussian the probability density function (pdf) of a random va...

Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition

This paper presents a new microphone-array post-filtering algorithm for distant speech recognition (DSR). Conventionally, post-filtering methods assume static noise field models, and using this assumption, employ a Wiener filter mechanism for estimating the noise parameters. In contrast to this, we show how we can build the Wiener post-filter based on actual noise observations without any noise...

Modelling the nonstationarity of speech in the maximum negentropy beamformer

State-of-the-art automatic speech recognition (ASR) systems can achieve very low word error rates (WERs) of below 5% on data recorded with headsets. However, in many situations such as ASR at meetings or in the car, far field microphones on the table, walls or devices such as laptops are preferable to microphones that have to be worn close to the users’ mouths. Unfortunately, the distance betwe...

Microsoft Word - CONTENTS-NOVEMBER06

This paper describes an Independent Component Analysis (ICA) based fixed-point algorithm for the blind separation of the convolutive mixture of speech, picked up by a linear microphone array. The proposed algorithm extracts independent sources by non-Gaussianizing the Time-Frequency Series of Speech (TFSS) in a deflationary way. The degree of non-Gaussianization is measured by negentropy. The relat...

Publication date: 2008